Efficiently Indexing High-Dimensional Data Spaces

Author

  • Christian Böhm
Abstract

Indexing high-dimensional data spaces is an emerging research domain. Its importance is growing with the need to support modern applications with powerful search tools. In so-called non-standard applications of database systems, such as multimedia, CAD, molecular biology, medical imaging, time series processing and many others, similarity search in large data sets is required as a basic functionality. A technique widely applied for similarity search is the so-called feature transformation, in which important properties of the database objects are mapped to points of a multidimensional vector space, the so-called feature vectors. Similarity queries are thus naturally translated into neighborhood queries in the feature space. To achieve high query processing performance, multidimensional index structures are used to manage the feature vectors. Unfortunately, multidimensional index structures deteriorate in performance as the dimension of the data space increases, both because they are primarily designed for low-dimensional data spaces and because of a set of effects usually called the ‘curse of dimensionality’. The general goal of this thesis is therefore to improve the efficiency of index-based query processing in high-dimensional data spaces. For this purpose, a cost model for index-based query processing in high-dimensional data spaces was developed. It is applicable to a variety of index structures and query processing techniques and can be used both for evaluating techniques and for optimization. Based on this cost model, a variety of improvement and optimization techniques for multidimensional index structures were developed. The first, called DABS-tree, involves a cost-model-based split algorithm supporting a dynamic and local adaptation of the block size of the index structure. Dynamic block size adaptation is especially useful because, as we can show, conventional index structures often access data in portions that are too small.
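To make the feature-transformation idea concrete, the following is a minimal sketch, not taken from the thesis, of a similarity query answered as a k-nearest-neighbor query in feature space. It uses a plain linear scan with Euclidean distance, which is the baseline that index structures aim to beat. The function name `knn_linear_scan` and the sample feature vectors are hypothetical and chosen only for illustration.

```python
import heapq
import math

def knn_linear_scan(features, query, k):
    """Return the indices of the k feature vectors closest to `query`.

    Assumption: similarity is measured by Euclidean distance. Every
    vector is inspected, so the cost grows linearly with the data set.
    """
    def dist(vec):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(vec, query)))

    # nsmallest returns indices ordered from nearest to farthest.
    return heapq.nsmallest(k, range(len(features)),
                           key=lambda i: dist(features[i]))

# Hypothetical 3-dimensional feature vectors, one per database object.
features = [
    (0.10, 0.20, 0.90),
    (0.80, 0.10, 0.10),
    (0.15, 0.25, 0.85),
    (0.50, 0.50, 0.50),
]
query = (0.12, 0.22, 0.88)

nearest = knn_linear_scan(features, query, k=2)
print(nearest)
```

An index structure would try to return the same result while reading only a fraction of the vectors; the abstract's point is that this advantage shrinks as the number of dimensions grows.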

Similar resources

A Divisive Hierarchical Clustering-Based Method for Indexing Image Data

It is conventional to use multi-dimensional indexing structures to accelerate search operations in content-based image retrieval systems. Much effort has gone into developing multi-dimensional indexing structures so far. In most practical applications of image retrieval, high-dimensional feature vectors are required, but current multi-dimensional indexing structures lose their effici...

SIMP: Accurate and Efficient Near Neighbor Search in Very High Dimensional Spaces

Near neighbor search in very high dimensional spaces is useful in many applications. Existing techniques solve this problem efficiently only for the approximate case. These solutions are designed to solve r-near neighbor queries only for a fixed query range or a set of query ranges with probabilistic guarantees, and are then extended to nearest neighbor queries. Solutions supporting a set of query...

Improving the Performance of High-Dimensional kNN Retrieval through Localized Dataspace Segmentation and Hybrid Indexing

Efficient data indexing and nearest neighbor retrieval are challenging tasks in high-dimensional spaces. This work builds upon our previous analyses of iDistance partitioning strategies to develop the backbone of a new indexing method using a heuristic-guided hybrid index that further segments congested areas of the dataspace to improve overall performance for exact k-nearest neighbor (kNN) que...

Retrieval of Optimal Subspace Clusters Set for an Effective Similarity Search in a High-Dimensional Spaces

High dimensional data is often analysed by resorting to its distribution properties in subspaces. Subspace clustering is a powerful method for elicitation of high dimensional data features. The result of subspace clustering can be an essential base for building indexing structures and further data search. However, a high number of subspaces and data instances can conceal a high number of subspace...

Local Dimensionality Reduction: A New Approach to Indexing High Dimensional Spaces

Many emerging application domains require database systems to support efficient access over highly multidimensional datasets. The current state-of-the-art technique for indexing high dimensional data is to first reduce the dimensionality of the data using Principal Component Analysis and then index the reduced-dimensionality space using a multidimensional index structure. The above technique, ...

Publication date: 1998